Thematically Reinforced Explicit Semantic Analysis
Abstract
We present an extended, thematically reinforced version of Gabrilovich and Markovitch’s Explicit Semantic Analysis (ESA), in which thematic information is obtained from the category structure of Wikipedia. To this end, we first define a notion of categorical tfidf, which measures the relevance of terms in categories. Using this measure as a weight, we calculate a maximal spanning tree of the Wikipedia corpus, considered as a directed graph of pages and categories. This tree provides a unique path of “most related categories” between each page and the top of the hierarchy. We reinforce the tfidf of words in a page by aggregating it with the categorical tfidfs of the nodes on these paths, and define a thematically reinforced ESA semantic relatedness measure that is more robust than standard ESA and less sensitive to noise caused by out-of-context words. We apply our method to the French Wikipedia corpus, evaluate it on a text classification task over a 37.5 MB corpus of 20 French newsgroups, and obtain a precision increase of 9–10% compared with standard ESA.
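The reinforcement step described in the abstract can be illustrated with a small sketch. The abstract does not give the aggregation formula, so everything specific here is an assumption made for illustration: the function names, the additive aggregation, and the geometric `decay` factor applied per level of the category path. The path of "most related categories" is taken as given input; computing it with a maximal spanning tree is not shown. Relatedness is computed as the cosine of concept vectors, as in standard ESA.

```python
import math


def reinforce(page_tfidf, path_cat_tfidf, decay=0.5):
    """Aggregate a page's tfidf with the categorical tfidf of its category path.

    page_tfidf:      {term: weight} for one page.
    path_cat_tfidf:  list of {term: weight}, one per category on the path,
                     ordered from the page's closest category upward.
    decay:           hypothetical damping factor per level (an assumption,
                     not a value taken from the paper).
    """
    out = dict(page_tfidf)
    for depth, cat_weights in enumerate(path_cat_tfidf, start=1):
        damp = decay ** depth
        for term, weight in cat_weights.items():
            out[term] = out.get(term, 0.0) + damp * weight
    return out


def esa_vector(tokens, concept_weights):
    """Map a text to a concept vector: one score per page (concept)."""
    return {
        concept: sum(weights.get(tok, 0.0) for tok in tokens)
        for concept, weights in concept_weights.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy data (invented for illustration): two pages and their category paths.
pages = {
    "Jaguar_(animal)": {"jaguar": 0.9, "cat": 0.5},
    "Jaguar_(car)": {"jaguar": 0.9, "engine": 0.6},
}
paths = {
    "Jaguar_(animal)": [{"cat": 0.4, "mammal": 0.8}],
    "Jaguar_(car)": [{"engine": 0.4, "vehicle": 0.8}],
}
reinforced = {name: reinforce(w, paths[name]) for name, w in pages.items()}

plain_score = cosine(esa_vector(["mammal"], pages), esa_vector(["cat"], pages))
reinforced_score = cosine(
    esa_vector(["mammal"], reinforced), esa_vector(["cat"], reinforced)
)
```

In this toy setup, "mammal" never occurs in either page, so plain ESA cannot relate it to "cat", while the reinforced weights inherit it from the animal page's category. This is only meant to show the mechanism, not to reproduce the reported results.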
Journal: CoRR
Volume: abs/1405.4364
Publication year: 2013